Molpath Tumor Only Workflow¶

MolpathTumorOnlyWorkflow · 1 contributor · 1 version

No documentation was provided: contribute one

Quickstart¶

from janis_bioinformatics.tools.pmac.molpathTumorOnlyWorkflow import MolpathTumorOnly_1_0_0

wf = WorkflowBuilder("myworkflow")

wf.step(
    "molpathtumoronlyworkflow_step",
    MolpathTumorOnly_1_0_0(
        sample_name=None,
        fastqs=None,
        seqrun=None,
        reference=None,
        region_bed=None,
        region_bed_extended=None,
        region_bed_annotated=None,
        genecoverage_bed=None,
        genome_file=None,
        panel_name=None,
        vcfcols=None,
        snps_dbsnp=None,
        snps_1000gp=None,
        known_indels=None,
        mills_indels=None,
        mutalyzer_server=None,
        pathos_db=None,
        maxRecordsInRam=None,
        gnomad=None,
    )
)
wf.output("fastq_qc", source=molpathtumoronlyworkflow_step.fastq_qc)
wf.output("markdups_bam", source=molpathtumoronlyworkflow_step.markdups_bam)
wf.output("doc_out", source=molpathtumoronlyworkflow_step.doc_out)
wf.output("summary", source=molpathtumoronlyworkflow_step.summary)
wf.output("gene_summary", source=molpathtumoronlyworkflow_step.gene_summary)
wf.output("region_summary", source=molpathtumoronlyworkflow_step.region_summary)
wf.output("gridss_vcf", source=molpathtumoronlyworkflow_step.gridss_vcf)
wf.output("gridss_bam", source=molpathtumoronlyworkflow_step.gridss_bam)
wf.output("haplotypecaller_vcf", source=molpathtumoronlyworkflow_step.haplotypecaller_vcf)
wf.output("haplotypecaller_bam", source=molpathtumoronlyworkflow_step.haplotypecaller_bam)
wf.output("haplotypecaller_norm", source=molpathtumoronlyworkflow_step.haplotypecaller_norm)
wf.output("mutect2_vcf", source=molpathtumoronlyworkflow_step.mutect2_vcf)
wf.output("mutect2_bam", source=molpathtumoronlyworkflow_step.mutect2_bam)
wf.output("mutect2_norm", source=molpathtumoronlyworkflow_step.mutect2_norm)
wf.output("addbamstats_vcf", source=molpathtumoronlyworkflow_step.addbamstats_vcf)

OR

Install Janis
Ensure Janis is configured to work with Docker or Singularity.
Ensure all reference files are available:

Note

More information about these inputs are available below.

Generate user input files for MolpathTumorOnlyWorkflow:

# user inputs
janis inputs MolpathTumorOnlyWorkflow > inputs.yaml

inputs.yaml

fastqs:
- - fastqs_0.fastq.gz
  - fastqs_1.fastq.gz
- - fastqs_0.fastq.gz
  - fastqs_1.fastq.gz
genecoverage_bed: genecoverage_bed.bed
genome_file: genome_file.txt
gnomad: gnomad.vcf.gz
known_indels: known_indels.vcf.gz
maxRecordsInRam: 0
mills_indels: mills_indels.vcf.gz
mutalyzer_server: <value>
panel_name: <value>
pathos_db: <value>
reference: reference.fasta
region_bed: region_bed.bed
region_bed_annotated: region_bed_annotated.bed
region_bed_extended: region_bed_extended.bed
sample_name: <value>
seqrun: <value>
snps_1000gp: snps_1000gp.vcf.gz
snps_dbsnp: snps_dbsnp.vcf.gz
vcfcols: vcfcols.txt

Run MolpathTumorOnlyWorkflow with:

janis run [...run options] \
    --inputs inputs.yaml \
    MolpathTumorOnlyWorkflow

Information¶

URL: No URL to the documentation was provided

ID:	`MolpathTumorOnlyWorkflow`
URL:	No URL to the documentation was provided
Versions:	v1.0.0
Authors:	Jiaan Yu
Citations:
Created:	2020-06-12
Updated:	2020-08-10

Outputs¶

name	type	documentation
fastq_qc	Array<Array<Zip>>
markdups_bam	IndexedBam
doc_out	TextFile
summary	csv
gene_summary	TextFile
region_summary	TextFile
gridss_vcf	VCF
gridss_bam	BAM
haplotypecaller_vcf	Gzipped<VCF>
haplotypecaller_bam	IndexedBam
haplotypecaller_norm	VCF
mutect2_vcf	Gzipped<VCF>
mutect2_bam	Optional<IndexedBam>
mutect2_norm	VCF
addbamstats_vcf	VCF

Workflow¶

Embedded Tools¶

FastQC	`fastqc/v0.11.5`
Parse FastQC Adaptors	`ParseFastqcAdaptors/v0.1.0`
Align and sort reads	`BwaAligner/1.0.0`
Merge and Mark Duplicates	`mergeAndMarkBams/4.1.3`
Annotate GATK3 DepthOfCoverage Workflow	`AnnotateDepthOfCoverage/v0.1.0`
Performance summary workflow (targeted bed)	`PerformanceSummaryTargeted/v0.1.0`
Gridss	`gridss/v2.6.2`
GATK Base Recalibration on Bam	`GATKBaseRecalBQSRWorkflow/4.1.3`
GATK4 Somatic Variant Caller for Tumour Only Samples with Targeted BED	`GATK4_SomaticVariantCallerTumorOnlyTargeted/v0.1.1`
GATK4: Haplotype Caller	`Gatk4HaplotypeCaller/4.1.3.0`
Split Multiple Alleles and Normalise Vcf	`SplitMultiAlleleNormaliseVcf/v0.5772`
Combine Variants	`combinevariants/0.0.8`
BGZip	`bgzip/1.9`
BCFTools: Sort	`bcftoolssort/v1.9`
UncompressArchive	`UncompressArchive/v1.0.0`
Annotate Bam Stats to Germline Vcf Workflow	`AddBamStatsGermline/v0.1.0`
Tabix	`tabix/1.2.1`
VcfLib: Vcf Length	`vcflength/v1.0.1`
VcfLib: Vcf Filter	`vcffilter/v1.0.1`

Additional configuration (inputs)¶

name	type	documentation
sample_name	String
fastqs	Array<FastqGzPair>
seqrun	String	SeqRun Name (for Vcf2Tsv)
reference	FastaWithIndexes
region_bed	bed
region_bed_extended	bed
region_bed_annotated	bed
genecoverage_bed	bed
genome_file	TextFile
panel_name	String
vcfcols	TextFile
snps_dbsnp	Gzipped<VCF>
snps_1000gp	Gzipped<VCF>
known_indels	Gzipped<VCF>
mills_indels	Gzipped<VCF>
mutalyzer_server	String
pathos_db	String
maxRecordsInRam	Integer
gnomad	Gzipped<VCF>
black_list	Optional<bed>
panel_of_normals	Optional<Gzipped<VCF>>
fastqc_threads	Optional<Integer>	(-t) Specifies the number of files which can be processed simultaneously. Each thread will be allocated 250MB of memory so you shouldn’t run more threads than your available memory will cope with, and not more than 6 threads on a 32 bit machine
align_and_sort_sortsam_tmpDir	Optional<String>	Undocumented option
gridss_tmpdir	Optional<String>
haplotype_caller_pairHmmImplementation	Optional<String>	The PairHMM implementation to use for genotype likelihood calculations. The various implementations balance a tradeoff of accuracy and runtime. The –pair-hmm-implementation argument is an enumerated type (Implementation), which can have one of the following values: EXACT;ORIGINAL;LOGLESS_CACHING;AVX_LOGLESS_CACHING;AVX_LOGLESS_CACHING_OMP;EXPERIMENTAL_FPGA_LOGLESS_CACHING;FASTEST_AVAILABLE. Implementation: FASTEST_AVAILABLE
combinevariants_type	Optional<String>	germline \| somatic
combinevariants_columns	Optional<Array<String>>	Columns to keep, seperated by space output vcf (unsorted)
filter_for_vcfs	Optional<String>
filter_variants_1_invert	Optional<Boolean>	(-v) inverts the filter, e.g. grep -v

Workflow Description Language¶

version development

import "tools/fastqc_v0_11_5.wdl" as F
import "tools/ParseFastqcAdaptors_v0_1_0.wdl" as P
import "tools/BwaAligner_1_0_0.wdl" as B
import "tools/mergeAndMarkBams_4_1_3.wdl" as M
import "tools/AnnotateDepthOfCoverage_v0_1_0.wdl" as A
import "tools/PerformanceSummaryTargeted_v0_1_0.wdl" as P2
import "tools/gridss_v2_6_2.wdl" as G
import "tools/GATKBaseRecalBQSRWorkflow_4_1_3.wdl" as G2
import "tools/GATK4_SomaticVariantCallerTumorOnlyTargeted_v0_1_1.wdl" as G3
import "tools/Gatk4HaplotypeCaller_4_1_3_0.wdl" as G4
import "tools/SplitMultiAlleleNormaliseVcf_v0_5772.wdl" as S
import "tools/combinevariants_0_0_8.wdl" as C
import "tools/bgzip_1_9.wdl" as B2
import "tools/bcftoolssort_v1_9.wdl" as B3
import "tools/UncompressArchive_v1_0_0.wdl" as U
import "tools/AddBamStatsGermline_v0_1_0.wdl" as A2
import "tools/tabix_1_2_1.wdl" as T
import "tools/vcflength_v1_0_1.wdl" as V
import "tools/vcffilter_v1_0_1.wdl" as V2

workflow MolpathTumorOnlyWorkflow {
  input {
    String sample_name
    Array[Array[File]] fastqs
    String seqrun
    File reference
    File reference_fai
    File reference_amb
    File reference_ann
    File reference_bwt
    File reference_pac
    File reference_sa
    File reference_dict
    File region_bed
    File region_bed_extended
    File region_bed_annotated
    File genecoverage_bed
    File genome_file
    String panel_name
    File vcfcols
    File? black_list
    File snps_dbsnp
    File snps_dbsnp_tbi
    File snps_1000gp
    File snps_1000gp_tbi
    File known_indels
    File known_indels_tbi
    File mills_indels
    File mills_indels_tbi
    String mutalyzer_server
    String pathos_db
    Int maxRecordsInRam
    File gnomad
    File gnomad_tbi
    File? panel_of_normals
    File? panel_of_normals_tbi
    Int? fastqc_threads = 4
    String? align_and_sort_sortsam_tmpDir = "."
    String? gridss_tmpdir = "."
    String? haplotype_caller_pairHmmImplementation = "LOGLESS_CACHING"
    String? combinevariants_type = "germline"
    Array[String]? combinevariants_columns = ["AD", "DP", "AF", "GT"]
    String? filter_for_vcfs = "length > 150"
    Boolean? filter_variants_1_invert = true
  }
  scatter (f in fastqs) {
     call F.fastqc as fastqc {
      input:
        reads=f,
        threads=select_first([fastqc_threads, 4])
    }
  }
  scatter (f in fastqc.datafile) {
     call P.ParseFastqcAdaptors as getfastqc_adapters {
      input:
        fastqc_datafiles=f
    }
  }
  scatter (Q in zip(fastqs, zip(getfastqc_adapters.adaptor_sequences, getfastqc_adapters.adaptor_sequences))) {
     call B.BwaAligner as align_and_sort {
      input:
        sample_name=sample_name,
        reference=reference,
        reference_fai=reference_fai,
        reference_amb=reference_amb,
        reference_ann=reference_ann,
        reference_bwt=reference_bwt,
        reference_pac=reference_pac,
        reference_sa=reference_sa,
        reference_dict=reference_dict,
        fastq=Q.left,
        cutadapt_adapter=Q.right.right,
        cutadapt_removeMiddle3Adapter=Q.right.right,
        sortsam_tmpDir=select_first([align_and_sort_sortsam_tmpDir, "."])
    }
  }
  call M.mergeAndMarkBams as merge_and_mark {
    input:
      bams=align_and_sort.out,
      bams_bai=align_and_sort.out_bai,
      maxRecordsInRam=maxRecordsInRam,
      sampleName=sample_name
  }
  call A.AnnotateDepthOfCoverage as annotate_doc {
    input:
      bam=merge_and_mark.out,
      bam_bai=merge_and_mark.out_bai,
      bed=region_bed_annotated,
      reference=reference,
      reference_fai=reference_fai,
      reference_amb=reference_amb,
      reference_ann=reference_ann,
      reference_bwt=reference_bwt,
      reference_pac=reference_pac,
      reference_sa=reference_sa,
      reference_dict=reference_dict,
      sample_name=sample_name
  }
  call P2.PerformanceSummaryTargeted as performance_summary {
    input:
      bam=merge_and_mark.out,
      bam_bai=merge_and_mark.out_bai,
      genecoverage_bed=genecoverage_bed,
      region_bed=region_bed,
      sample_name=sample_name,
      genome_file=genome_file
  }
  call G.gridss as gridss {
    input:
      bams=[merge_and_mark.out],
      bams_bai=[merge_and_mark.out_bai],
      reference=reference,
      reference_fai=reference_fai,
      reference_amb=reference_amb,
      reference_ann=reference_ann,
      reference_bwt=reference_bwt,
      reference_pac=reference_pac,
      reference_sa=reference_sa,
      reference_dict=reference_dict,
      blacklist=black_list,
      tmpdir=select_first([gridss_tmpdir, "."])
  }
  call G2.GATKBaseRecalBQSRWorkflow as bqsr {
    input:
      bam=merge_and_mark.out,
      bam_bai=merge_and_mark.out_bai,
      intervals=region_bed_extended,
      reference=reference,
      reference_fai=reference_fai,
      reference_amb=reference_amb,
      reference_ann=reference_ann,
      reference_bwt=reference_bwt,
      reference_pac=reference_pac,
      reference_sa=reference_sa,
      reference_dict=reference_dict,
      snps_dbsnp=snps_dbsnp,
      snps_dbsnp_tbi=snps_dbsnp_tbi,
      snps_1000gp=snps_1000gp,
      snps_1000gp_tbi=snps_1000gp_tbi,
      known_indels=known_indels,
      known_indels_tbi=known_indels_tbi,
      mills_indels=mills_indels,
      mills_indels_tbi=mills_indels_tbi
  }
  call G3.GATK4_SomaticVariantCallerTumorOnlyTargeted as mutect2 {
    input:
      bam=bqsr.out,
      bam_bai=bqsr.out_bai,
      intervals=region_bed_extended,
      reference=reference,
      reference_fai=reference_fai,
      reference_amb=reference_amb,
      reference_ann=reference_ann,
      reference_bwt=reference_bwt,
      reference_pac=reference_pac,
      reference_sa=reference_sa,
      reference_dict=reference_dict,
      gnomad=gnomad,
      gnomad_tbi=gnomad_tbi,
      panel_of_normals=panel_of_normals,
      panel_of_normals_tbi=panel_of_normals_tbi
  }
  call G4.Gatk4HaplotypeCaller as haplotype_caller {
    input:
      pairHmmImplementation=select_first([haplotype_caller_pairHmmImplementation, "LOGLESS_CACHING"]),
      inputRead=bqsr.out,
      inputRead_bai=bqsr.out_bai,
      reference=reference,
      reference_fai=reference_fai,
      reference_amb=reference_amb,
      reference_ann=reference_ann,
      reference_bwt=reference_bwt,
      reference_pac=reference_pac,
      reference_sa=reference_sa,
      reference_dict=reference_dict,
      dbsnp=snps_dbsnp,
      dbsnp_tbi=snps_dbsnp_tbi,
      intervals=region_bed_extended
  }
  call S.SplitMultiAlleleNormaliseVcf as splitnormalisevcf {
    input:
      compressedVcf=haplotype_caller.out,
      reference=reference,
      reference_fai=reference_fai,
      reference_amb=reference_amb,
      reference_ann=reference_ann,
      reference_bwt=reference_bwt,
      reference_pac=reference_pac,
      reference_sa=reference_sa,
      reference_dict=reference_dict
  }
  call C.combinevariants as combinevariants {
    input:
      vcfs=[splitnormalisevcf.out, mutect2.out],
      type=select_first([combinevariants_type, "germline"]),
      columns=select_first([combinevariants_columns, ["AD", "DP", "AF", "GT"]])
  }
  call B2.bgzip as compressvcf {
    input:
      file=combinevariants.out
  }
  call B3.bcftoolssort as sortvcf {
    input:
      vcf=compressvcf.out
  }
  call U.UncompressArchive as uncompressvcf {
    input:
      file=sortvcf.out
  }
  call A2.AddBamStatsGermline as addbamstats {
    input:
      bam=merge_and_mark.out,
      bam_bai=merge_and_mark.out_bai,
      vcf=uncompressvcf.out,
      reference=reference,
      reference_fai=reference_fai,
      reference_amb=reference_amb,
      reference_ann=reference_ann,
      reference_bwt=reference_bwt,
      reference_pac=reference_pac,
      reference_sa=reference_sa,
      reference_dict=reference_dict
  }
  call B2.bgzip as compressvcf2 {
    input:
      file=addbamstats.out
  }
  call T.tabix as tabixvcf {
    input:
      inp=compressvcf2.out
  }
  call V.vcflength as calculate_variant_length {
    input:
      vcf=tabixvcf.out
  }
  call V2.vcffilter as filter_variants_1_failed {
    input:
      vcf=calculate_variant_length.out,
      info_filter=select_first([filter_for_vcfs, "length > 150"])
  }
  call V2.vcffilter as filter_variants_1 {
    input:
      vcf=calculate_variant_length.out,
      info_filter=select_first([filter_for_vcfs, "length > 150"]),
      invert=select_first([filter_variants_1_invert, true])
  }
  output {
    Array[Array[File]] fastq_qc = fastqc.out
    File markdups_bam = merge_and_mark.out
    File markdups_bam_bai = merge_and_mark.out_bai
    File doc_out = annotate_doc.out
    File summary = performance_summary.out
    File gene_summary = performance_summary.geneFileOut
    File region_summary = performance_summary.regionFileOut
    File gridss_vcf = gridss.out
    File gridss_bam = gridss.assembly
    File haplotypecaller_vcf = haplotype_caller.out
    File haplotypecaller_vcf_tbi = haplotype_caller.out_tbi
    File haplotypecaller_bam = haplotype_caller.bam
    File haplotypecaller_bam_bai = haplotype_caller.bam_bai
    File haplotypecaller_norm = splitnormalisevcf.out
    File mutect2_vcf = mutect2.variants
    File mutect2_vcf_tbi = mutect2.variants_tbi
    File? mutect2_bam = mutect2.out_bam
    File? mutect2_bam_bai = mutect2.out_bam_bai
    File mutect2_norm = mutect2.out
    File addbamstats_vcf = addbamstats.out
  }
}

Common Workflow Language¶

#!/usr/bin/env cwl-runner
class: Workflow
cwlVersion: v1.2
label: Molpath Tumor Only Workflow

requirements:
- class: InlineJavascriptRequirement
- class: StepInputExpressionRequirement
- class: ScatterFeatureRequirement
- class: SubworkflowFeatureRequirement
- class: MultipleInputFeatureRequirement

inputs:
- id: sample_name
  type: string
- id: fastqs
  type:
    type: array
    items:
      type: array
      items: File
- id: seqrun
  doc: SeqRun Name (for Vcf2Tsv)
  type: string
- id: reference
  type: File
  secondaryFiles:
  - pattern: .fai
  - pattern: .amb
  - pattern: .ann
  - pattern: .bwt
  - pattern: .pac
  - pattern: .sa
  - pattern: ^.dict
- id: region_bed
  type: File
- id: region_bed_extended
  type: File
- id: region_bed_annotated
  type: File
- id: genecoverage_bed
  type: File
- id: genome_file
  type: File
- id: panel_name
  type: string
- id: vcfcols
  type: File
- id: black_list
  type:
  - File
  - 'null'
- id: snps_dbsnp
  type: File
  secondaryFiles:
  - pattern: .tbi
- id: snps_1000gp
  type: File
  secondaryFiles:
  - pattern: .tbi
- id: known_indels
  type: File
  secondaryFiles:
  - pattern: .tbi
- id: mills_indels
  type: File
  secondaryFiles:
  - pattern: .tbi
- id: mutalyzer_server
  type: string
- id: pathos_db
  type: string
- id: maxRecordsInRam
  type: int
- id: gnomad
  type: File
  secondaryFiles:
  - pattern: .tbi
- id: panel_of_normals
  type:
  - File
  - 'null'
  secondaryFiles:
  - pattern: .tbi
- id: fastqc_threads
  doc: |-
    (-t) Specifies the number of files which can be processed simultaneously. Each thread will be allocated 250MB of memory so you shouldn't run more threads than your available memory will cope with, and not more than 6 threads on a 32 bit machine
  type: int
  default: 4
- id: align_and_sort_sortsam_tmpDir
  doc: Undocumented option
  type: string
  default: .
- id: gridss_tmpdir
  type: string
  default: .
- id: haplotype_caller_pairHmmImplementation
  doc: |-
    The PairHMM implementation to use for genotype likelihood calculations. The various implementations balance a tradeoff of accuracy and runtime. The --pair-hmm-implementation argument is an enumerated type (Implementation), which can have one of the following values: EXACT;ORIGINAL;LOGLESS_CACHING;AVX_LOGLESS_CACHING;AVX_LOGLESS_CACHING_OMP;EXPERIMENTAL_FPGA_LOGLESS_CACHING;FASTEST_AVAILABLE. Implementation:  FASTEST_AVAILABLE
  type: string
  default: LOGLESS_CACHING
- id: combinevariants_type
  doc: germline | somatic
  type: string
  default: germline
- id: combinevariants_columns
  doc: Columns to keep, seperated by space output vcf (unsorted)
  type:
    type: array
    items: string
  default:
  - AD
  - DP
  - AF
  - GT
- id: filter_for_vcfs
  type: string
  default: length > 150
- id: filter_variants_1_invert
  doc: (-v) inverts the filter, e.g. grep -v
  type: boolean
  default: true

outputs:
- id: fastq_qc
  type:
    type: array
    items:
      type: array
      items: File
  outputSource: fastqc/out
- id: markdups_bam
  type: File
  secondaryFiles:
  - pattern: .bai
  outputSource: merge_and_mark/out
- id: doc_out
  type: File
  outputSource: annotate_doc/out
- id: summary
  type: File
  outputSource: performance_summary/out
- id: gene_summary
  type: File
  outputSource: performance_summary/geneFileOut
- id: region_summary
  type: File
  outputSource: performance_summary/regionFileOut
- id: gridss_vcf
  type: File
  outputSource: gridss/out
- id: gridss_bam
  type: File
  outputSource: gridss/assembly
- id: haplotypecaller_vcf
  type: File
  secondaryFiles:
  - pattern: .tbi
  outputSource: haplotype_caller/out
- id: haplotypecaller_bam
  type: File
  secondaryFiles:
  - pattern: .bai
  outputSource: haplotype_caller/bam
- id: haplotypecaller_norm
  type: File
  outputSource: splitnormalisevcf/out
- id: mutect2_vcf
  type: File
  secondaryFiles:
  - pattern: .tbi
  outputSource: mutect2/variants
- id: mutect2_bam
  type:
  - File
  - 'null'
  secondaryFiles:
  - pattern: .bai
  outputSource: mutect2/out_bam
- id: mutect2_norm
  type: File
  outputSource: mutect2/out
- id: addbamstats_vcf
  type: File
  outputSource: addbamstats/out

steps:
- id: fastqc
  label: FastQC
  in:
  - id: reads
    source: fastqs
  - id: threads
    source: fastqc_threads
  scatter:
  - reads
  run: tools/fastqc_v0_11_5.cwl
  out:
  - id: out
  - id: datafile
- id: getfastqc_adapters
  label: Parse FastQC Adaptors
  in:
  - id: fastqc_datafiles
    source: fastqc/datafile
  scatter:
  - fastqc_datafiles
  run: tools/ParseFastqcAdaptors_v0_1_0.cwl
  out:
  - id: adaptor_sequences
- id: align_and_sort
  label: Align and sort reads
  in:
  - id: sample_name
    source: sample_name
  - id: reference
    source: reference
  - id: fastq
    source: fastqs
  - id: cutadapt_adapter
    source: getfastqc_adapters/adaptor_sequences
  - id: cutadapt_removeMiddle3Adapter
    source: getfastqc_adapters/adaptor_sequences
  - id: sortsam_tmpDir
    source: align_and_sort_sortsam_tmpDir
  scatter:
  - fastq
  - cutadapt_adapter
  - cutadapt_removeMiddle3Adapter
  scatterMethod: dotproduct
  run: tools/BwaAligner_1_0_0.cwl
  out:
  - id: out
- id: merge_and_mark
  label: Merge and Mark Duplicates
  in:
  - id: bams
    source: align_and_sort/out
  - id: maxRecordsInRam
    source: maxRecordsInRam
  - id: sampleName
    source: sample_name
  run: tools/mergeAndMarkBams_4_1_3.cwl
  out:
  - id: out
- id: annotate_doc
  label: Annotate GATK3 DepthOfCoverage Workflow
  in:
  - id: bam
    source: merge_and_mark/out
  - id: bed
    source: region_bed_annotated
  - id: reference
    source: reference
  - id: sample_name
    source: sample_name
  run: tools/AnnotateDepthOfCoverage_v0_1_0.cwl
  out:
  - id: out
  - id: out_sample_summary
- id: performance_summary
  label: Performance summary workflow (targeted bed)
  in:
  - id: bam
    source: merge_and_mark/out
  - id: genecoverage_bed
    source: genecoverage_bed
  - id: region_bed
    source: region_bed
  - id: sample_name
    source: sample_name
  - id: genome_file
    source: genome_file
  run: tools/PerformanceSummaryTargeted_v0_1_0.cwl
  out:
  - id: out
  - id: geneFileOut
  - id: regionFileOut
- id: gridss
  label: Gridss
  in:
  - id: bams
    source:
    - merge_and_mark/out
    linkMerge: merge_nested
  - id: reference
    source: reference
  - id: blacklist
    source: black_list
  - id: tmpdir
    source: gridss_tmpdir
  run: tools/gridss_v2_6_2.cwl
  out:
  - id: out
  - id: assembly
- id: bqsr
  label: GATK Base Recalibration on Bam
  in:
  - id: bam
    source: merge_and_mark/out
  - id: intervals
    source: region_bed_extended
  - id: reference
    source: reference
  - id: snps_dbsnp
    source: snps_dbsnp
  - id: snps_1000gp
    source: snps_1000gp
  - id: known_indels
    source: known_indels
  - id: mills_indels
    source: mills_indels
  run: tools/GATKBaseRecalBQSRWorkflow_4_1_3.cwl
  out:
  - id: out
- id: mutect2
  label: GATK4 Somatic Variant Caller for Tumour Only Samples with Targeted BED
  in:
  - id: bam
    source: bqsr/out
  - id: intervals
    source: region_bed_extended
  - id: reference
    source: reference
  - id: gnomad
    source: gnomad
  - id: panel_of_normals
    source: panel_of_normals
  run: tools/GATK4_SomaticVariantCallerTumorOnlyTargeted_v0_1_1.cwl
  out:
  - id: variants
  - id: out_bam
  - id: out
- id: haplotype_caller
  label: 'GATK4: Haplotype Caller'
  in:
  - id: pairHmmImplementation
    source: haplotype_caller_pairHmmImplementation
  - id: inputRead
    source: bqsr/out
  - id: reference
    source: reference
  - id: dbsnp
    source: snps_dbsnp
  - id: intervals
    source: region_bed_extended
  run: tools/Gatk4HaplotypeCaller_4_1_3_0.cwl
  out:
  - id: out
  - id: bam
- id: splitnormalisevcf
  label: Split Multiple Alleles and Normalise Vcf
  in:
  - id: compressedVcf
    source: haplotype_caller/out
  - id: reference
    source: reference
  run: tools/SplitMultiAlleleNormaliseVcf_v0_5772.cwl
  out:
  - id: out
- id: combinevariants
  label: Combine Variants
  in:
  - id: vcfs
    source:
    - splitnormalisevcf/out
    - mutect2/out
  - id: type
    source: combinevariants_type
  - id: columns
    source: combinevariants_columns
  run: tools/combinevariants_0_0_8.cwl
  out:
  - id: out
- id: compressvcf
  label: BGZip
  in:
  - id: file
    source: combinevariants/out
  run: tools/bgzip_1_9.cwl
  out:
  - id: out
- id: sortvcf
  label: 'BCFTools: Sort'
  in:
  - id: vcf
    source: compressvcf/out
  run: tools/bcftoolssort_v1_9.cwl
  out:
  - id: out
- id: uncompressvcf
  label: UncompressArchive
  in:
  - id: file
    source: sortvcf/out
  run: tools/UncompressArchive_v1_0_0.cwl
  out:
  - id: out
- id: addbamstats
  label: Annotate Bam Stats to Germline Vcf Workflow
  in:
  - id: bam
    source: merge_and_mark/out
  - id: vcf
    source: uncompressvcf/out
  - id: reference
    source: reference
  run: tools/AddBamStatsGermline_v0_1_0.cwl
  out:
  - id: out
- id: compressvcf2
  label: BGZip
  in:
  - id: file
    source: addbamstats/out
  run: tools/bgzip_1_9.cwl
  out:
  - id: out
- id: tabixvcf
  label: Tabix
  in:
  - id: inp
    source: compressvcf2/out
  run: tools/tabix_1_2_1.cwl
  out:
  - id: out
- id: calculate_variant_length
  label: 'VcfLib: Vcf Length'
  doc: Add the length column for the output of AddBamStats
  in:
  - id: vcf
    source: tabixvcf/out
  run: tools/vcflength_v1_0_1.cwl
  out:
  - id: out
- id: filter_variants_1_failed
  label: 'VcfLib: Vcf Filter'
  in:
  - id: vcf
    source: calculate_variant_length/out
  - id: info_filter
    source: filter_for_vcfs
  run: tools/vcffilter_v1_0_1.cwl
  out:
  - id: out
- id: filter_variants_1
  label: 'VcfLib: Vcf Filter'
  in:
  - id: vcf
    source: calculate_variant_length/out
  - id: info_filter
    source: filter_for_vcfs
  - id: invert
    source: filter_variants_1_invert
  run: tools/vcffilter_v1_0_1.cwl
  out:
  - id: out
id: MolpathTumorOnlyWorkflow